Objective: Automatic text summarization tools can help users in the biomedical domain to access information efficiently from a large volume of scientific literature and other sources of text documents. In this paper, we propose a summarization method that combines itemset mining and domain knowledge to construct a concept-based model and to extract the main subtopics from an input document. Our summarizer quantifies the informativeness of each sentence using the support values of itemsets appearing in the sentence. Methods: To address the concept-level analysis of text, our method initially maps the original document to biomedical concepts using the UMLS. Then, it discovers the essential subtopics of the text using a data mining technique, namely itemset mining, and constructs the summarization model. The employed itemset mining algorithm extracts a set of frequent itemsets containing correlated and recurrent concepts of the input document. The summarizer selects the most related and informative sentences and generates the final summary. Results: We evaluate the performance of our itemset-based summarizer using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, performing a set of experiments. The results show that the itemset-based summarizer performs better than the compared methods. The itemset-based summarizer achieves the best scores for all the assessed ROUGE metrics. Conclusion: Compared to the statistical, similarity, and word frequency methods, the proposed method demonstrates that the summarization model obtained from the concept extraction and itemset mining provides the summarizer with an effective metric for measuring the informative content of sentences. This can lead to an improvement in the performance of biomedical literature summarization.
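The pipeline described in the Methods (concept mapping, frequent itemset mining, support-based sentence scoring, sentence selection) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: each sentence is already represented as a set of concept identifiers (the strings below are made-up placeholders, not real UMLS concepts, and the mapping step itself is skipped), frequent itemsets are found by naive enumeration rather than by the specific mining algorithm the authors employ, and a sentence's score is taken to be the sum of the supports of the frequent itemsets it contains. The function names and the parameters `min_support`, `max_size`, and `k` are likewise hypothetical.

```python
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support} for every itemset with support >= min_support.

    Each transaction is the set of concepts found in one sentence. Support is
    the fraction of sentences that contain all concepts of the itemset.
    Naive enumeration: suitable for a toy example only.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, max_size + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            support = count / n
            if support >= min_support:
                frequent[itemset] = support
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of a larger size exists.
            break
    return frequent


def score_sentences(transactions, itemsets):
    """Score each sentence as the sum of supports of the itemsets it contains
    (an assumed scoring rule, chosen for illustration)."""
    return [
        sum(support for itemset, support in itemsets.items() if itemset <= t)
        for t in transactions
    ]


def summarize(transactions, min_support=0.5, k=2):
    """Return the indices of the k highest-scoring sentences in document order."""
    itemsets = mine_frequent_itemsets(transactions, min_support)
    scores = score_sentences(transactions, itemsets)
    # Ties are broken in favour of the earlier sentence.
    ranked = sorted(range(len(transactions)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


# Toy document: four sentences, each reduced to placeholder concept labels.
toy_sentences = [
    {"concept_a", "concept_b", "concept_c"},
    {"concept_a", "concept_b"},
    {"concept_d"},
    {"concept_a", "concept_c"},
]
selected = summarize(toy_sentences, min_support=0.5, k=2)
```

In this toy input, the sentence containing only `concept_d` scores zero because no frequent itemset covers it, while the first sentence scores highest because it contains every frequent itemset. A real system would replace the placeholder sets with the output of a UMLS concept mapper and would likely use an established mining algorithm in place of the enumeration shown here.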